Compositional Neural Network Language Models for Agglutinative Languages
Authors
Abstract
Continuous space language models (CSLMs) have proven successful in speech recognition. With proper training of the word embeddings, words that are semantically or syntactically related are expected to be mapped to nearby locations in the continuous space. In agglutinative languages, words are formed by concatenating stems and suffixes, so compositional modeling is important. However, when trained on word tokens, CSLMs do not explicitly consider this structure. In this paper, we explore compositional modeling of stems and suffixes in a long short-term memory neural network language model. Our proposed models jointly learn distributed representations for stems and endings (concatenations of suffixes) and predict the probability of stem and ending sequences. Experiments on the Turkish broadcast news transcription task show that the proposed models obtain further gains on top of a state-of-the-art stem-ending-based n-gram language model.
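To illustrate the stem-ending decomposition the abstract describes, the sketch below splits pre-segmented words into a stem token and an "ending" token (the concatenation of the word's suffixes), producing the sub-word sequence a stem-ending language model would be trained on. This is a minimal illustration, not the paper's implementation: the `+` segmentation marker, the helper names, and the example words are assumptions, and a real system would rely on a morphological analyzer to produce the segmentation.

```python
def split_stem_ending(segmented_word):
    """Split a pre-segmented word 'stem+suffix1+suffix2' into (stem, ending).

    The ending keeps a leading '+' so word boundaries can be recovered
    from the flat token stream. (Marker convention is an assumption.)
    """
    if "+" not in segmented_word:
        return segmented_word, None  # bare stem, no suffixes
    stem, _, rest = segmented_word.partition("+")
    return stem, "+" + rest

def to_lm_tokens(segmented_sentence):
    """Turn a list of segmented words into the stem/ending token
    sequence that a stem-ending language model would score."""
    tokens = []
    for word in segmented_sentence:
        stem, ending = split_stem_ending(word)
        tokens.append(stem)
        if ending is not None:
            tokens.append(ending)
    return tokens

# 'evlerimizden' ("from our houses") segmented as ev+ler+imiz+den
print(to_lm_tokens(["ev+ler+imiz+den", "geldik"]))
# → ['ev', '+ler+imiz+den', 'geldik']
```

Because endings carry the leading `+`, the original words can be reconstructed from the token stream, which keeps the model's vocabulary small (stems plus a closed set of endings) while remaining lossless.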
Similar resources
Joint PoS Tagging and Stemming for Agglutinative Languages
The number of word forms in agglutinative languages is theoretically infinite and this variety in word forms introduces sparsity in many natural language processing tasks. Part-of-speech tagging (PoS tagging) is one of these tasks that often suffers from sparsity. In this paper, we present an unsupervised Bayesian model using Hidden Markov Models (HMMs) for joint PoS tagging and stemming for ag...
Document Categorization with Modified Statistical Language Models for Agglutinative Languages
In this paper, we investigate the document categorization task with statistical language models. Our study mainly focuses on categorization of documents in agglutinative languages. Due to the productive morphology of agglutinative languages, the number of word forms encountered in naturally occurring text is very large. From the language modeling perspective, a large vocabulary results in serio...
Syllable-level Neural Language Model for Agglutinative Language
Language models for agglutinative languages have always been hindered in the past by the myriad of agglutinations possible for any given word through various affixes. We propose a method to diminish the problem of out-of-vocabulary words by introducing an embedding derived from syllables and morphemes which leverages the agglutinative property. Our model outperforms character-level embedding in perpl...
Learning grammars with recurrent neural networks
Most of the language acquisition models constructed so far are based on traditional AI approaches. In contrast, artificial neural networks (ANNs) have many valuable abilities, such as the ability to learn, generalization capability, and robustness. However, they are poor at representing compositional structures and manipulating them, and are consi...
A Syllable-based Technique for Word Embeddings of Korean Words
Word embedding has become a fundamental component to many NLP tasks such as named entity recognition and machine translation. However, popular models that learn such embeddings are unaware of the morphology of words, so it is not directly applicable to highly agglutinative languages such as Korean. We propose a syllable-based learning model for Korean using a convolutional neural network, in wh...
Journal title:
Volume Issue
Pages -
Publication date: 2016